Method and Apparatus for describing sound sources

The invention relates to a method and to an apparatus for
describing sound sources, especially for sound sources
encoded as audio objects according to the MPEG-4 Audio
standard.

Background

The MPEG-4 Audio standard as defined in ISO/IEC 14496-3 and
14496-1 facilitates a wide variety of applications by
supporting the representation of audio objects. For the
combination of the audio objects additional information - the
so-called scene description - determines the placement in space
and time and is transmitted together with the coded audio
objects.

For playback the audio objects are decoded separately and
composed using the scene description in order to prepare a
single soundtrack, which is then played to the listener.

For efficiency, the MPEG-4 Systems standard ISO/IEC 14496-1
defines a way to encode the scene description in a binary
representation, the so-called Binary Format for Scene
Description (BIFS). Correspondingly, audio scenes are
described using so-called AudioBIFS.

A scene description is structured hierarchically and can be
represented as a graph, wherein leaf nodes of the graph form
the separate objects and the other nodes describe the
processing, e.g. positioning, scaling, effects etc. The
appearance and behavior of the separate objects can be
controlled using parameters within the scene description nodes.
Invention

The invention is based on the recognition of the following
fact. Currently the MPEG-4 Audio standard cannot describe
sound sources that have a certain dimension, like a choir,
orchestra, sea or rain, but only a point source, e.g. a
flying insect or a single instrument. According to listening
tests the wideness of sound sources is clearly audible, whereas
more complicated descriptions, such as the shape of the audio
object, are not necessary.

Therefore, a problem to be solved by the invention is to
allow the description of the wideness of sound sources that
have a certain dimension in a simple and backwards
compatible way.

This problem is solved by the method disclosed in claim 1
and the corresponding apparatus in claim 5.

In principle, the inventive method allows describing sound
sources which are encoded as separate audio objects. The
arrangement of the sound sources in a sound scene is
described by a scene description. For playback the audio
objects are decoded separately and a single soundtrack is
composed from the decoded audio objects using said scene
description. For describing the wideness of a sound source an
audio spatial diffuseness node is defined within the scene
description.

Advantageous additional embodiments of the invention are
disclosed in the respective dependent claims.






Drawings

Exemplary embodiments of the invention are described with
reference to the accompanying drawings, which show in

Fig. 1 an illustration of the functionality of the
       AudioSpatialDiffuseness node;
Fig. 2 an audio scene for a line sound source;
Fig. 3 an exemplary scene with a combination of shapes
       to represent a more complex audio source.



Exemplary embodiments 

Figure 1 shows an illustration of the functionality of the
inventive AudioSpatialDiffuseness node, in the following
also named AudioDiffusenes node.

This AudioSpatialDiffuseness node has a children field
as input and produces the same number of channels (numChan)
as output. Branches that are connected to an upper
level branch are called children in MPEG-4 terms. The node can
be inserted in each branch of the audio subtree without
changing any other node.

A diffuseSelect field allows the scene author to control
the diffuseness algorithms, so that each
AudioSpatialDiffuseness node produces a different output. In
practice a diffuseness node virtually produces N different
signals, but only one real signal, the one selected by the
diffuseSelect field, is passed through to the output of the node.
Other fields like a decorrelation strength (decorrStrength)
etc. could be added to the node, if required.
AudioSpatialDiffuseness {
  eventIn      MFNode  addChildren
  eventIn      MFNode  removeChildren
  exposedField MFNode  children        [ ]
  exposedField SFInt32 diffuseSelect   -1
  exposedField SFInt32 decorrStrength  1
  field        SFInt32 numChan         1
  field        MFInt32 phaseGroup      [ ]
}

Table 1: Semantics of the proposed AudioSpatialDiffuseness
Node

In the case of numChan greater than one, each channel should
be diffused separately.
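
The selection behaviour of the node can be illustrated with a minimal Python sketch. The decorrelation algorithm itself is not specified in this text, so the sketch assumes a hypothetical scheme in which each diffuseSelect value picks one short pseudo-random FIR filter. The names `decorrelation_filter`, `convolve` and `audio_spatial_diffuseness` are illustrative only and do not belong to any MPEG-4 API.

```python
import random


def decorrelation_filter(diffuse_select, length=8):
    """Return a unit-energy FIR filter chosen by diffuse_select.

    Assumption: a seeded pseudo-random filter stands in for the
    diffuseness algorithm, which the text leaves unspecified.
    """
    rng = random.Random(diffuse_select)  # same selection -> same filter
    taps = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    norm = sum(t * t for t in taps) ** 0.5
    return [t / norm for t in taps]


def convolve(signal, taps):
    """Plain FIR convolution of one channel with the filter taps."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(taps):
            out[i + j] += sample * tap
    return out


def audio_spatial_diffuseness(children, diffuse_select=1):
    """Diffuse each input channel separately; the channel count is kept."""
    taps = decorrelation_filter(diffuse_select)
    return [convolve(channel, taps) for channel in children]


# One decoded mono source, reused by two nodes with different selections.
beach = [[1.0, 0.0, 0.0, 0.0]]
out_a = audio_spatial_diffuseness(beach, diffuse_select=1)
out_b = audio_spatial_diffuseness(beach, diffuse_select=2)
```

Both outputs have the same number of channels as the input, while the two selections yield different signals from the same decoded source.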



Figure 2 depicts an audio scene for a line sound source. With
this proposal the scene author has to decide how many
decorrelated point sound sources are used and at which
positions they are located. The advantage is that the content
author has much more control over the shape effect. The author
can also use the intensity and direction of each point source,
as well as the AudioDelay and AudioEffects node for certain
Sound nodes, to manipulate the effect.

It is still possible for the renderer to reduce the
computational load by parsing the scene tree to look for
identical AudioSources.

# Example of a line sound source replaced by three point sources
# using one single decoder output.
Group {
  children [
    DEF POS1 Sound {
      intensity 0.9
      location 0 0 0
      spatialize TRUE
      source AudioSpatialDiffuseness {
        numChan 1
        diffuseSelect 1
        children [
          DEF BEACH AudioSource {
            numChan 1
            url 100
          }
        ]
      }
    }
    DEF POS2 Sound {
      intensity 0.8
      location -3 0 0
      spatialize TRUE
      source AudioSpatialDiffuseness {
        numChan 1
        diffuseSelect 2
        children [ USE BEACH ]
      }
    }
    DEF POS3 Sound {
      intensity 0.8
      location 3 0 0
      spatialize TRUE
      source AudioSpatialDiffuseness {
        numChan 1
        diffuseSelect 3
        children [ USE BEACH ]
      }
    }
  ]
}

Table 2: Example of a Line Sound Source replaced by
three Point Sources using one single AudioSource.
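
The idea of replacing a line source by several point sources that share one decoded AudioSource can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the proposed node semantics: the helper names `line_source` and `sources_to_decode`, the dictionary layout and the even spacing along the x-axis are all hypothetical choices.

```python
def line_source(length, n_points, source_id, intensity=0.8):
    """Spread n_points point sources evenly along the x-axis.

    Every point references the same source_id, so a single decoder
    output is enough; each point gets its own diffuseSelect value.
    """
    sounds = []
    for i in range(n_points):
        if n_points == 1:
            x = 0.0
        else:
            x = -length / 2.0 + i * length / (n_points - 1)
        sounds.append({
            "location": (x, 0.0, 0.0),
            "intensity": intensity,
            "diffuseSelect": i + 1,
            "source": source_id,
        })
    return sounds


def sources_to_decode(sounds):
    """Identical sources are decoded once, however often they are reused."""
    return {sound["source"] for sound in sounds}


sounds = line_source(length=6.0, n_points=3, source_id="BEACH")
```

With a length of 6 and three points the sketch places the sources at x = -3, 0 and 3, the same positions as in the listing, and `sources_to_decode(sounds)` contains only the single shared source.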

According to a further embodiment primitive shapes are
defined and combined using the AudioSpatialDiffuseness nodes
to form more complex shapes. An advantageous selection of
shapes is e.g. a box, a sphere and a cylinder. All of these
nodes should have a location field, a size and a rotation,
as shown in table 3.



SoundBox / SoundSphere / SoundCylinder {
  eventIn      MFNode  addChildren
  eventIn      MFNode  removeChildren
  exposedField MFNode  children
  exposedField MFFloat intensity      1.0
  exposedField SFVec3f location       0,0,0
  exposedField SFVec3f size           2,2,2
  exposedField SFVec3f rotationaxis   0,0,1
  exposedField MFFloat rotationangle  0.0
}

Table 3



If one size parameter is set to zero, the volume becomes flat,
resulting in a wall or a disk. If two dimensions are zero, a
line results.
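
The rule for degenerate shapes can be written down as a small Python sketch. The function name `degenerate_shape` and the returned labels are hypothetical, and the case where all three size components are zero is not covered by the text, so treating it as a point is an assumption.

```python
def degenerate_shape(size):
    """Classify a shape by how many of its three size components are zero."""
    zeros = sum(1 for component in size if component == 0)
    if zeros == 0:
        return "volume"
    if zeros == 1:
        return "flat"   # a wall or a disk
    if zeros == 2:
        return "line"
    return "point"      # assumption: the text does not cover this case


flat = degenerate_shape((2.0, 0.0, 6.0))
line = degenerate_shape((5.0, 0.0, 0.0))
```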

Fig. 3 shows a scene with two audio sources, a choir (or
orchestra) located in front of a listener L and an applauding
audience to the left, right and back of the listener.
The choir consists of one SoundSphere C and the audience
consists of three SoundBoxes A1, A2, and A3 connected
with AudioDiffusenes nodes.

A BIFS example for the scene of figure 3 is shown in
table 4.
## The Choir
SoundSphere {
  location 0.0 0.0 -7.0       # 7 meter to the back
  size 3.0 0.6 1.5            # wide 3; height 0.6; depth 1.5
  intensity 0.9
  spatialize TRUE
  children [
    AudioSource {
      numChan 1
      url 1
    }
  ]
}

## The audience consists out of 3 SoundBoxes
SoundBox {                    # SoundBox to the left
  location -3.5 0.0 2.0       # 3.5 meter to the left
  size 2.0 0.5 6.0            # wide 2; height 0.5; depth 6.0
  intensity 0.9
  spatialize TRUE
  source AudioDiffusenes {
    diffuseSelect 1
    decorrStrength 1.0
    children [
      DEF APPLAUSE AudioSource {
        numChan 1
        url 2
      }
    ]
  }
}

SoundBox {                    # SoundBox to the right
  location 3.5 0.0 2.0        # 3.5 meter to the right
  size 2.0 0.5 6.0            # wide 2; height 0.5; depth 6.0
  intensity 0.9
  spatialize TRUE
  source AudioDiffusenes {
    diffuseSelect 2
    decorrStrength 1.0
    children [ USE APPLAUSE ]
  }
}

SoundBox {                    # SoundBox in the middle
  location 0.0 0.0 0.0        # centered at the origin
  size 5.0 0.5 2.0            # wide 5; height 0.5; depth 2.0
  direction 0.0 0.0 0.0 1.0   # default
  intensity 0.9
  spatialize TRUE
  source AudioDiffusenes {
    diffuseSelect 3
    decorrStrength 1.0
    children [ USE APPLAUSE ]
  }
}



Table 4 



In this example an AudioSource named APPLAUSE is defined in the
children field of the first SoundBox and is reused as audio
source for the second and third SoundBox. Furthermore, in
this case the diffuseSelect field signals for the respective
SoundBox which of the signals is passed through to the
output.

In the case of a 2D scene it is still assumed that the sound
will be 3D. Therefore it is proposed to use a second set of
SoundVolume nodes, where the z-axis is replaced by a single
float field with the name 'depth', as shown in table 5.






SoundBox2D / SoundSphere2D / SoundCylinder2D {
  eventIn      MFNode  addChildren
  eventIn      MFNode  removeChildren
  exposedField MFNode  children       [ ]
  exposedField MFFloat intensity      1.0
  exposedField SFVec2f location       0,0
  exposedField SFFloat locationdepth  0
  exposedField SFVec2f size           2,2
  exposedField SFFloat sizedepth
  exposedField SFVec2f rotationaxis   0,0
  exposedField SFFloat
  exposedField MFFloat rotationangle  0.0
}

Table 5
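
How a renderer might recombine the 2D fields with their separate depth fields can be sketched in Python. This is only an illustration: the helper names `to_3d` and `sound_volume_2d_to_3d` and the dictionary layout are hypothetical, and the mapping of the depth value onto the z component is an assumption derived from the statement that the z-axis is replaced by a depth field.

```python
def to_3d(vec2, depth):
    """Combine a 2D vector and its separate depth field into (x, y, z)."""
    x, y = vec2
    return (x, y, depth)


def sound_volume_2d_to_3d(node):
    """Rebuild 3D location and size from the 2D fields plus depth fields."""
    return {
        "location": to_3d(node["location"], node["locationdepth"]),
        "size": to_3d(node["size"], node["sizedepth"]),
    }


# Field values are given explicitly; no defaults are assumed here.
box_2d = {
    "location": (-3.5, 0.0),
    "locationdepth": 2.0,
    "size": (2.0, 0.5),
    "sizedepth": 6.0,
}
box_3d = sound_volume_2d_to_3d(box_2d)
```

With these values the result matches the location and size of the left SoundBox in table 4.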
Claims

1. Method for describing sound sources, which are encoded as
separate audio objects, wherein the arrangement of the
sound sources in a sound scene is described by a scene
description, and wherein for playback the audio objects
are decoded separately and a single soundtrack is
composed from the decoded audio objects using said scene
description, characterized by an audio diffuseness node
which is defined within the scene description for
describing the wideness of a sound source.

2. Method according to claim 1, wherein a diffuse selection
field will allow the scene author to control the
diffuseness algorithms.

3. Method according to claim 1 or 2, wherein a decorrelation
strength field will allow the author to control the strength
of the decorrelation.

4. Method according to any of claims 1 to 3, wherein shapes
are defined and combined using the AudioSpatialDiffuseness
nodes to form more complex shapes.

5. Apparatus for performing a method according to any of
claims 1 to 4.
Abstract

The MPEG-4 Audio standard as defined in ISO/IEC 14496-1 and
-3 facilitates a wide variety of applications by supporting
the representation of audio objects. For the combination of
the audio objects additional information - the so-called
scene description - determines the placement in space and
time and is transmitted together with the coded audio
objects.

For playback the audio objects are decoded separately and
composed using the scene description in order to prepare a
single soundtrack, which is then played to the listener. A
scene description is structured hierarchically and can be
represented as a graph, wherein nodes of the graph form the
separate objects. The appearance and behaviour of the
separate objects can be controlled using parameters within the
scene description nodes. For describing the wideness of a
sound source an audio diffuseness node is defined within
the scene description.

Fig. 1